Skip to content

fix(parse): use safe_extract_zip for UnderstandingAPI zip download to close Zip Slip#2634

Open
r266-tech wants to merge 1 commit into
volcengine:mainfrom
r266-tech:fix-understanding-api-zipslip
Open

fix(parse): use safe_extract_zip for UnderstandingAPI zip download to close Zip Slip#2634
r266-tech wants to merge 1 commit into
volcengine:mainfrom
r266-tech:fix-understanding-api-zipslip

Conversation

@r266-tech

Copy link
Copy Markdown
Contributor

What

UnderstandingAPI (added in #2614) extracts the third-party-downloaded result zip with a raw zipfile.ZipFile(...).extractall(), bypassing the project's canonical safe_extract_zip Zip-Slip guard.

with tempfile.TemporaryDirectory() as extract_dir:
    with zipfile.ZipFile(zip_path, "r") as zf:
        zf.extractall(extract_dir)   # <- no Zip-Slip protection

Why

#2615's same-day hardening sweep routed every other production extractor through safe_extract_zip (utils/zip_safe.py, utils/skill_processor.py, upload paths), but its file list didn't include understanding_api.py, so this one download path still calls extractall(). A malicious or MITM'd Understanding-API result zip whose members use ../../… would currently escape the temp dir and write onto the user's filesystem. The safe_extract_zip invariant is deliberately source-agnostic, so even a configured-endpoint source should go through it.

Change

One-line swap to the same helper parse/parsers/zip_parser.py already uses:

safe_extract_zip(zf, extract_dir)

No behavior change for benign archives: safe_extract_zip validates each member stays inside extract_dir and recreates the same tree the downstream extract_path.iterdir() / single-root-dir detection expects (safe_extract_zip Path()-coerces the str temp dir and calls normalize_zip_filenames internally). Zip-Slip behavior is already covered centrally by tests/misc/test_zip_safe.py.

@github-actions

Copy link
Copy Markdown

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis ✅

2615 - Fully compliant

Compliant requirements:

  • Uses project's canonical safe_extract_zip which enforces safe ZIP extraction (rejects traversal, absolute, drive-prefixed entries, normalizes separators)
⏱️ Estimated effort to review: 1 🔵⚪⚪⚪⚪
🏅 Score: 98
🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

@github-actions

Copy link
Copy Markdown

PR Code Suggestions ✨

No code suggestions found for the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

1 participant